Executive Summary


• Bioinformatics provides a framework for understanding biocomplexity. 

• Texas Tech University has been in the process of developing a bioinformatics program 
since the early 1990s when the term bioinformatics was first used. 

• The Natural Science Research Laboratory of the Museum of Texas Tech University is
among the first to develop computer-based catalogues and to use bar-coded tags for
recording information on specimens.

• Texas Tech University, in concert with Texas Parks and Wildlife Department, initiated a
computerized Natural Science Database in 1994 as an extension of the computerized
collections databases developed by the Museum in the early 1980s.

• The emerging bioinformatics program of Texas Tech University involves the
multidisciplinary work of at least 58 individuals in 10 departments and 4 colleges.

• Relational databases being developed are compatible with the National Biological
Information Infrastructure of the U.S. Geological Survey, Biological Resources Division.

• Databases and research products are being placed on the World Wide Web at
<www.tcru.ttu.edu/tcru/> and linked to the Museum web site at <www.nsrl.ttu.edu>.

• Relational databases are being developed to provide organization and structure for
disparate data on such things as mammals, habitats, vegetation, soils, precipitation,
elevation, and photo-documentation.

• Society will benefit from data being arranged for visual presentation in geographic
information systems, maps, reports, tables, and graphs.

• Historic data and archived specimens provide the base against which we measure change 
and identify voids in scientific knowledge. 

• Well-structured relational databases will serve economic development by providing rapid
information on which sound decisions may be based.

• Students trained in developing relational databases readily find employment in the
information industry.

Front cover: Schematic representation of a relational database consisting of multiple databases housed in
a number of departments at Texas Tech University and agencies of the State of Texas. Data from any
one data set (rabies, for example) can be linked in time and space (Universal Transverse Mercator,
UTM) to reveal the relationships of mammals which carry rabies to habitat (vegetation, land use,
crops, pesticides, climate, geology, etc.) and impacts on society (human health and economics).
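
The kind of linkage described in the caption can be sketched with a small relational example. The Python sketch below uses the standard-library sqlite3 module (not the Oracle system named later in this paper) to join a hypothetical table of rabies cases to a hypothetical habitat table on shared UTM grid coordinates and year. All table names, column names, and records are invented for illustration and are not drawn from the Texas Tech databases.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding two small, invented data sets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE rabies_cases (
            case_id     INTEGER PRIMARY KEY,
            species     TEXT,
            year        INTEGER,
            utm_zone    INTEGER,
            easting_km  INTEGER,   -- UTM easting of the grid cell, in km
            northing_km INTEGER    -- UTM northing of the grid cell, in km
        );
        CREATE TABLE habitat (
            utm_zone    INTEGER,
            easting_km  INTEGER,
            northing_km INTEGER,
            year        INTEGER,
            vegetation  TEXT,
            land_use    TEXT,
            PRIMARY KEY (utm_zone, easting_km, northing_km, year)
        );
        """
    )
    # Hypothetical records, for illustration only.
    conn.executemany(
        "INSERT INTO rabies_cases VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "striped skunk", 1996, 14, 250, 3700),
            (2, "gray fox", 1996, 14, 251, 3700),
            (3, "striped skunk", 1997, 14, 250, 3700),
        ],
    )
    conn.executemany(
        "INSERT INTO habitat VALUES (?, ?, ?, ?, ?, ?)",
        [
            (14, 250, 3700, 1996, "mesquite shrubland", "rangeland"),
            (14, 251, 3700, 1996, "juniper woodland", "rangeland"),
            (14, 250, 3700, 1997, "mesquite shrubland", "cropland"),
        ],
    )
    return conn


def cases_with_habitat(conn):
    """Link each case to the habitat recorded for the same place and year."""
    return conn.execute(
        """
        SELECT r.species, r.year, h.vegetation, h.land_use
        FROM rabies_cases AS r
        JOIN habitat AS h
          ON  r.utm_zone    = h.utm_zone
          AND r.easting_km  = h.easting_km
          AND r.northing_km = h.northing_km
          AND r.year        = h.year
        ORDER BY r.case_id
        """
    ).fetchall()
```

Calling `cases_with_habitat(build_demo_db())` returns one row per case, carrying the vegetation and land use recorded for the same grid cell and year. The same join pattern would extend to soils, climate, or any other data set keyed on location and time.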
Bioinformatics is a new field of science that utilizes data from the biological
sciences and many ancillary disciplines (Fishman, 1996; Grace, 1997). A formal
definition of bioinformatics has been given as “the systematic development and
application of computing systems and computational solution techniques analyzing
data obtained by experiments, modeling, database search, and instrumentation
regarding biological aspects” (<http://www.bioplanet.com/whatis.html>). We have
broadened and extended this definition to include “the delivery of the data and
its synthesis to potential users.” By this definition, we envision the delivery
of data and products developed from a synthesis of data to school children,
educators, landowners, personnel in state and federal agencies, legislators,
lawmakers, investors, bankers, and other members of the public.

Although in its infancy, bioinformatics is a rapidly growing field. In a search
of the Internet, seven search engines (Excite, Lycos, Infoseek, Yahoo, HotBot,
Alta Vista, and Northern Lights) identified a 1993 document from a 1992
symposium as the first to contain the word bioinformatics, defined as the
management of biological information through computer technology (Lim et al.,
1993). By 1997, there were 843 documents with the term bioinformatics
identified by these same seven search engines. As an indication of the rapid
growth of interest, the search engine Northern Lights located 23,685 documents
containing the term bioinformatics in November 1998. The use of this term has
primarily been associated with molecular genomics. The Human Genome Project,
with its goal of determining the sequence of approximately three billion base
pairs, made it imperative that computer technology be utilized to organize and
manage the explosion of genomic information (Collins et al., 1998). Today, it
is possible to search the database of DNA sequences of nearly one billion
nucleotide bases to determine the recorded sequences that most closely match
any newly sequenced DNA fragment. Such a search typically takes a few minutes
and frequently only a few seconds.


In our world of specialists and special interest groups, it’s easy to become
immersed in the details and lose sight of the big picture. We tend to
concentrate on problems in our own back yard and leave problems in other parts
of the world to others. Biologist E.O. Wilson in his recent book “Consilience:
The Unity of Knowledge” presents a strong argument that all knowledge is
related and that the world operates in accordance with a number of fundamental
natural laws (Wilson, 1998a, 1998b). The relatedness of biological life, the
intricacies of the environment, and the importance of biodiversity have been
recognized by others. Rita Colwell, Director of the National Science
Foundation, promotes use of the term biocomplexity to denote the variety of
the living planet and our ability to understand its intricately integrated
systems of life (Mervis, 1998). Decisions made by society are not made in
isolation from the natural world, but often are shaped by the biological world,
the flora and fauna, in turn reflecting the chemical and physical condition of
the environment. Thus, bioinformatics provides a framework for understanding
this biocomplexity.

Bioinformatics is relevant to all of us, as we all live in the biological
world. The data sets upon which bioinformatics is based help us understand,
recognize, and address such crucial issues as environmental quality,
biodiversity, habitat composition, resource allocation, and sustainable
development opportunities. The organization of biological data and ancillary
ecological attributes into relational databases provides an order of
understanding not available prior to the development of bioinformatics. It is
our view that the accessibility and interpretation of these data will have an
unparalleled effect on the utility of biological data not only in the
sciences, but in the private sector as well.

Insight from such data provides knowledge about the world’s biocomplexity,
biodiversity, and maintenance of aesthetics, as well as the ability for
sustainable planning and economic opportunity. In a world with 1.75 million
species of organisms described and estimates ranging between 4 and 40+ million
undescribed species, extinction has reached an alarming estimated rate of
30,000 to 300,000 species per year (Wilson, 1988). These are species with
intrinsic value as the “cogs in the wheel” (the simple inter-relatedness of
organisms in the ecosystem) and of unknown potential value as a source of
medicinal drugs, chemicals, DNA sequences, and alternate food crops.

Parker et al., 1998, Bioinformatics: A Multidisciplinary Approach for the Life Sciences.


Existing Programs in Bioinformatics 


Few bioinformatics programs now exist in the United States and those that do
are devoted almost exclusively to genomic projects. The National Center for
Genome Resources contains the Genome Sequence database, established to
increase awareness of bioinformatics by medical specialists and
sub-specialists. The U.S. Department of Agriculture has established the
Agricultural Genomics Program employing bioinformatic techniques. The U.S.
Department of Energy has a bioinformatics division deeply involved with the
Human Genome Project. Another example in the Federal Government is in the
Department of Interior, which has established the National Biological
Information Infrastructure (NBII), a bioinformatics program of another name
managed by the U.S. Geological Survey. The Presidential Committee of Advisors
on Science and Technology recommended that the Federal Government invest a
minimum of $40 million per year for the next 5 years to develop the next
generation NBII. At the state level, the Southwest Region of the U.S. Fish and
Wildlife Service, in cooperation with the Center for Wildlife Law at the
University of New Mexico Institute of Public Law, has developed an Internet
linkage titled ‘the Southwest Roadmap’ (<http://roadmap.unm.edu>) to link
multiple websites of interest to natural resource managers and researchers.
Several pharmaceutical companies in the United States have established
departments or divisions with bioinformatics as part of the formal name. For
example, the Immunex Corporation, Seattle, Washington, has a Department of
Bioinformatics; Hoffmann-La Roche, Nutley, New Jersey, has the Department of
Oncology and Bioinformatics; and SmithKline Beecham Pharmaceuticals,
Collegeville, Pennsylvania, has a division of Bioinformatics.

The biomedical community of Europe and much of the world seems to have
embraced and implemented programs employing bioinformatics. For example, the
Max-Delbrück Center, Berlin, Germany, has established the Department of
Bioinformatics; the Swiss Institute for Environmental Cancer Research,
Lausanne, Switzerland, has a Bioinformatics Group; the National University of
Singapore has established a Bioinformatics Center in the National University
Hospital; and South Africa has the National Bioinformatics Institute. In
Mississauga, Ontario, Canada, the Base 4 Bioinformatics Company has been
established to increase understanding in (1) the basic biological processes of
the body, (2) ways in which these processes may malfunction to cause disease,
and (3) improved drug discovery and development processes.

Although it is clear that bioinformatics is a growing field and that several
state, federal, and international agencies are beginning to implement programs
in this area, what is needed is an academic training program that prepares
scientists for these programs (Anonymous, 1996; Marshall, 1996a; Marshall,
1996b). For example, any recent issue of Science contains job vacancy
announcements for six to twelve positions in bioinformatics. In Science 281
(5385), the University of Missouri, Clemson University, and the University of
Georgia advertised for bioinformatics positions to work in maize genomics. In
the same issue, the National Cancer Institute advertised for a database
administrator in the Office of Informatics and the University of Denmark
advertised five positions in bioinformatics. Qualifications required for these
jobs typically include backgrounds in biology and computer sciences, knowledge
of Oracle and other database systems, skills in computer techniques,
telecommunications, mapping techniques, development of World Wide Web pages,
and the ability to communicate effectively with diverse users of the data.


Current Activities at Texas Tech University 


The basis of a bioinformatics program at Texas Tech University has been
developing for a number of years. A 1996 publication with 23 authors from 13
different sectors (departments, universities, state, and federal agencies)
outlined the critical need for a natural science database and its value to
resource management and public health (Baker et al., 1996). In September of
1998, a presentation on bioinformatics was included as part of the program of
the Taxonomy Database Working Group in Reading, England (Parker et al., 1998).
These two publications represented the combined efforts of dozens of students,
staff, and faculty. For example, in addition to the multidisciplinary author
lines, these publications represent the compiled work of at least 58
individuals in 10 departments and 4 colleges at Texas Tech University (see
acknowledgments), all of whom were necessary to solve the questions at hand.

At Texas Tech University, we planned to create a relational database which
would extend across the University to many colleges and departments. The
Museum, through the Natural Science Research Laboratory, in partnership with
the Texas Cooperative Fish and Wildlife Research Unit (a component of U.S.
Geological Survey’s Biological Resources Division) and Texas Parks and
Wildlife Department, has taken the lead in initiating this program. This
partnership program has state (Texas GAP; Parker et al., in press), national
(National GAP Analysis Program; Scott et al., 1993), and international (Rio
Grande GAP, headquartered at TTU; Gonzalez-Rebeles et al., 1997) components.
To serve the components, we have acquired the database Oracle to provide the
data structure for several databases currently residing in the Museum and in
the GAP lab. Data to be placed in the Oracle database and selectively made
available through the Web include vegetation maps for each county in Texas,
maps of vertebrate distributions in Texas, records from the mammal collection,
text, photos, and maps from The Mammals of Texas (Davis and Schmidly, 1994),
and climate data from 3,860 sites in Texas, some with daily records spanning
over 100 years.

A doctoral student currently working on the historical distribution and
population size of scaled quail is using vegetation maps of Texas-GAP and other
ancillary data to quantify how changes in habitat relate to the population size
of scaled quail. The two data sets included in this study are climate and the
Breeding Bird Survey. The climate data are from 3,860 sites in Texas with daily
records of five to ten variables including average, maximum, and minimum
temperature, rainfall, and snow. Some of these records span over 100 years.
These data, purchased by Civil Engineering and used for their research, were in
multiple files on a CD. Each data point was stored as a separate file on the
CD. For 12 months, students in the TX-GAP lab opened these files and
transferred data point by point into a relational database, allowing analysis
of these data in a geographical information system (GIS). These data are now in
a format that can be used in the GAP lab and by others for spatial analysis.
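
A transfer of this kind, from one file per data point into a single relational table, can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in the TX-GAP lab: the file-naming scheme (`<site>_<date>_<variable>.txt`), the table layout, the station identifiers, and the sample values are all assumptions made for the example, and the standard-library sqlite3 module stands in for the production database.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_points(conn, directory):
    """Load every one-value file in `directory` into the `climate` table.

    Assumed (hypothetical) file naming: <site>_<YYYY-MM-DD>_<variable>.txt,
    with the file body holding a single number.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS climate (
            site_id  TEXT,
            obs_date TEXT,
            variable TEXT,
            value    REAL,
            PRIMARY KEY (site_id, obs_date, variable)
        )
        """
    )
    rows = []
    for path in sorted(Path(directory).glob("*.txt")):
        site_id, obs_date, variable = path.stem.split("_")
        rows.append((site_id, obs_date, variable, float(path.read_text().strip())))
    # INSERT OR REPLACE keeps a reload of the same files from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO climate VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def demo():
    """Write three invented data-point files, load them, and query one site."""
    with tempfile.TemporaryDirectory() as tmp:
        samples = {
            "S0001_1950-07-04_tmax.txt": "38.5",
            "S0001_1950-07-04_tmin.txt": "21.0",
            "S0002_1950-07-04_rain.txt": "0.0",
        }
        for name, body in samples.items():
            (Path(tmp) / name).write_text(body)
        conn = sqlite3.connect(":memory:")
        loaded = load_points(conn, tmp)
        site_rows = conn.execute(
            "SELECT variable, value FROM climate WHERE site_id = ? ORDER BY variable",
            ("S0001",),
        ).fetchall()
    return loaded, site_rows
```

Once the points sit in one table keyed on site, date, and variable, a query for a single site or date range replaces opening files one at a time, and adding site coordinates to the table would let the records be joined to GIS layers.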

The Breeding Bird Survey is maintained by the Biological Resources Division of
the U.S. Geological Survey (formerly by the U.S. Fish and Wildlife Service).
The data for Texas are available on five microfilm tapes and cover the years
1967 through 1993. The 1994 through 1996 data are available in hardcopy format.
The 1997 data are being computerized by the national office of the Breeding
Bird Survey. The time estimate for transferring the 1967 through 1993 data for
Texas into computer files is 18 months at 40 hours per week. The student
needing these data will be able to computerize only a small sample of the
Breeding Bird Survey for Texas. However, once these selected data are in a
relational database, we expect to add other sites, and years, as components of
new projects. We also are negotiating with the Breeding Bird Survey office to
secure funding to assist in computerizing these files.

The aforementioned database is being developed for broad-scale as well as
specific application by Texas Parks and Wildlife Department as the scientific
basis for a natural resource management plan. The plan is to collect and
properly archive samples of the vertebrate forms so that a detailed and
accurate record will be available for future questions concerning the
distribution and abundance of species at the beginning of the 21st century. The
geographic localities being sampled are the properties administered by TPWD.
Although these properties are not randomly distributed across the state, there
is representation of most ecological regions and habitat types. In the archives
of the NSRL, there are samples of properly prepared and documented classical
museum specimens as well as cryopreserved samples of liver, heart, kidney,
muscle, spleen, and lungs. At this point we know that each specimen contains
vast amounts of information that can be valuable to science and society related
to topics such as genetics (especially DNA sequences), conservation genetics,
disease, levels of contamination of tissues from pollution, and presence of
pathogens and viruses. We realize that this biological information is critical
for implementing wise decisions; however, what we don’t know is how much new
technology will allow us to study relationships which have not yet been
conceived.


If the technological advances of the next century or the next 25 years match
the last century or even the last quarter century, then the extractable
knowledge will experience an exponential growth rate (Corn and Horrigan, 1984).


The Vision 


Our goal is to develop a relational database at Texas Tech University with
spatial data including information about ecoregions (Bailey et al., 1994),
climate, soils, distribution of vertebrates, vegetation, crops, pesticide
application, field notes, photos, maps, land-use, land-cover, parks, public
properties, and wildlife management areas. Electronic databases also are
planned to include 3-D digital images of essentially all objects in museum
collections, including items such as skulls and arrowheads. Using tools now
available in CAD-CAM programs, these images could be rotated and viewed from
any angle. Features undetectable with the unaided eye could be
computer-enhanced to allow analysis and comparisons that could not previously
be made, even when holding the real object in hand. Similarly, in applications
extending beyond the Museum, 3-D molecular models of DNA, proteins, and other
biochemicals would aid in development of drugs and understanding of biological
processes. The study of interactions of antigens and antibodies, hormones and
receptor sites, and actions of endocrine disruptors would be enhanced by 3-D
images that could be viewed from all angles as holograms are viewed.

The 3-D technology now available in the entertainment field will soon be
available as a tool for science. Native American pictographs or rock drawings
are now stored as 3-D images at the University of Arkansas. Other museums also
may be using similar technology for a variety of collections. Improvements in
computer hardware, software, and Internet transmissions (Miller, 1998; Seifa
1996; Williams, 1997) will make this an attractive method for presenting
information on museum collections to viewers around the world. The University
of Illinois at Chicago has developed a Cave Automated Virtual Environment
(CAVE), a virtual reality environment in which the viewer can walk around and
interact with visual images. Home entertainment systems of the future will
likely include surround video along with surround sound. This virtual image
technology could also be used in the medical sciences to view molecular
structure, drugs and receptor sites, and MRI images, and to provide visual
evaluation of prostheses to be placed in the body or to view organs prior to
surgery. IBM’s new Pacific Blue supercomputer, today’s fastest, with speeds of
3.9 trillion calculations per second, will allow image analysis and processing
never before imagined, and the world’s fastest computer today will most likely
be far too slow for tomorrow’s needs. With state-of-the-art technology, we
envision development of CDs and other storage media to contain selected
datasets that will serve schools, businesses, and the public at sites not
connected to the Internet, or at sites where hardware limits the speed of
Internet accessibility. We envision the development and delivery of products
to be an on-going process to keep abreast of technological advances. The
storage media of today will probably soon become obsolete as society demands
faster and more user-friendly data synthesis and delivery.


The Value


A robust relational database would provide structure for data now available
and expanding at an unprecedented rate. For example, the gross state product
for Texas was $372 billion in 1990, and in 1995 the construction industry was
valued at $91 billion (Ramos and Plocheck, 1995). In 1994, agriculture products
at the farm and ranch gate level were valued at $13 billion, and at $42 billion
for the collective agricultural industries. All of these industries are
regulated at the state and federal level in some manner to control, for
example, air pollution, soil erosion, effluents, and site selection for
specific projects. Data required for farmers, developers, and other businesses
to complete permit applications include site surveys for threatened and
endangered species, evaluation of wetlands status, potential for soil erosion,
location of aquifers, groundwater recharge zones, and location of existing
wells for water, oil, and gas. The location of hazardous waste sites,
landfills, and the storage sites for toxic material must be known before many
farming, development, or even conservation projects can be permitted.

A relational database that brings together many data sets would be of great
value to society. For example, a database providing the information needed to
complete the applications for the permits required in the construction industry
alone could save the industry $1 million per day in interest if only one tenth
of one percent of the industry could reduce permit application time by 1 day
[$10 billion × 0.001 (fraction of industry benefitting) × 0.10 (interest rate)
= $1 million]. Similar savings in the agricultural industry would save over $4
million per day. No dollar amount has been assigned to the potential value of
such a database containing information about distribution of rabies,
hantavirus, and other emerging viruses. The expected value of such a database
to protect environmental quality and public health would be immense.
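
The savings estimate above is a simple product, and it can be checked with a short Python sketch. The function name and constants are introduced here for illustration; the figures are those quoted in the text, with "one tenth of one percent" taken as the fraction of the industry benefitting. The $42 billion base for agriculture is an assumption inferred from the value given above for the collective agricultural industries. Exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction


def interest_savings(industry_value, fraction_benefitting, interest_rate):
    """Return the product used in the text: value x fraction x interest rate."""
    return industry_value * fraction_benefitting * interest_rate


ONE_TENTH_OF_ONE_PERCENT = Fraction(1, 1000)
INTEREST_RATE = Fraction(1, 10)  # the 0.10 rate given in the text

# $10 billion is the construction figure used in the bracketed formula.
construction = interest_savings(
    10_000_000_000, ONE_TENTH_OF_ONE_PERCENT, INTEREST_RATE
)

# Assumption: the agricultural estimate starts from the $42 billion quoted
# for the collective agricultural industries.
agriculture = interest_savings(
    42_000_000_000, ONE_TENTH_OF_ONE_PERCENT, INTEREST_RATE
)

print(f"construction: ${int(construction):,}")
print(f"agriculture:  ${int(agriculture):,}")
```

Under these inputs the products are $1,000,000 for construction and $4,200,000 for agriculture, matching the "$1 million" and "over $4 million" figures in the text.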


Summary


The next millennium has been labeled the information age, and scientific data
must be readily available as the basis for decisions affecting our growing
society. Advancements in communications, computers, remote sensing, and
instantaneous analysis are rapidly pushing science to new fields and forcing a
merger of fields across interdisciplinary lines. For example, as human health
is placed at risk by pollution and serious diseases such as HIV, Ebola, and
other emerging viruses, the solutions are not to be found in one discipline of
science, but in a synthesis from multiple disciplines.

We have established a bioinformatics program at Texas Tech University that
involves the Museum and other multidisciplinary fields and programs, such as
the Texas GAP program and the Department of Biological Sciences. One goal of
this program is to build a relational database, as a linkage of distributed
databases, which can be accessed through the Internet. We are using Oracle as
the primary relational database with specific existing applications running in
Visual FoxPro, MS Access, and Excel. Examples of the data now available include
vertebrate collections from the Museum; field notes, photos and records of the
1895-1906 biological survey of Texas; routes of the Breeding Bird Survey of
Texas conducted annually by the U.S. Fish and Wildlife Service; daily weather
data from 3,860 sites in Texas with some records extending back over 100 years;
current landscape photographs of habitats in Texas; soil maps for Texas;
vegetation in Texas; aerial videography; digital elevation maps; and Landsat TM
scenes for all of Texas and the North American Trade Zone, along the border
with Mexico, established by the North American Free Trade Agreement.

Several products have been produced to date and placed on the Internet.
Applications include soil maps by agriculture crop type, design and placement
of constructed wetlands for extraction of nutrients from the effluent of cattle
feedlots, integration of aquaculture with traditional agriculture, distribution
models for vertebrate species in Texas, and identification of areas with high
levels of biodiversity. These and other projects will allow researchers,
resource managers, landowners, school children, and the general public to use
the best available data in their livelihoods, projects, research, and
decisions.